Summary

Routing marine mammal (and other marine taxa) telemetry tracks around land has been on the wish list of bio-loggers for many years. A number of approaches have been pursued over the years, but many have required either censoring/cherry-picking of data or extensive computation time. Here, we rely on the same routing technology many of us use daily when we ask our phones for directions from point A to point B. In brief, a network of nodes and edges is created that represents possible paths through water (and around land) within the study area. Creating this network can take many hours. However, once the network is created, it does not need to be rebuilt for that study area. The graph is then used to determine the shortest path between two points through the network. This step is fast, so hundreds of tracks can be re-routed around land in just a few minutes. To our knowledge, this network/graph approach has not been applied to the movement of marine animals. The approach we outline here (and implement within the ‘pathroutr’ package) is intended to be used in conjunction with a predicted track output from a movement model (e.g. from the R package crawl). To reduce computation time, we focus only on fixing the portions of the predicted track that cross land.

The package does rely on the availability of a PostgreSQL database with both the PostGIS and pgRouting extensions installed. This database could be running locally, within your enterprise/academic IT infrastructure, or on a cloud service. While detailed knowledge of PostgreSQL, PostGIS, and pgRouting is not a requirement, the large size and computational demands of the network mean you will likely benefit from some investment in optimizing the server configuration (i.e., you might need a lot of memory, and more than a few CPUs) if your study area is large or includes complicated land structures.
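If you are pointing at an existing enterprise or cloud server rather than installing locally, a quick way to confirm the required extensions are available is to query `pg_available_extensions` (a sketch; adjust the connection flags for your host, user, and database):

```shell
# list the routing-related extensions the server can install;
# add -h/-U/-d flags as needed for your connection
psql -c "SELECT name, default_version FROM pg_available_extensions WHERE name IN ('postgis', 'pgrouting');"
```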

library(tidyverse)
library(crawl)
library(ptolemy)
library(sf)
library(ggspatial)
library(mapview)
library(RPostgres)
library(here)

Installing PostgreSQL, PostGIS, and pgRouting

This approach relies on the pgRouting extension to PostgreSQL, so you’ll need access to a properly configured database server. This initial example is based on a local installation on macOS, where the easiest option is to install with Homebrew:

brew install postgresql
brew install postgis
brew install pgrouting

At this point, you’ll need to do some additional setup on your macOS machine to get postgres up and running. If you manage Homebrew with the same user account you use for all your R work, you can skip this first part (just run initdb and your database cluster will be created in the default location). Otherwise, you need to make sure the postgres database cluster is initialized in a directory you have full read/write access to.

mkdir ~/postgres
initdb -D /Users/$(whoami)/postgres
pg_ctl -D /Users/$(whoami)/postgres -l /Users/$(whoami)/logfile start
createdb
pg_ctl -D /Users/$(whoami)/postgres -l /Users/$(whoami)/logfile stop

The above steps initialize the database cluster and then call pg_ctl to start it. Then, createdb creates a database (the name defaults to $(whoami), but you may want to give your database a name relevant to your study area). The last statement stops the server.

To make things a bit easier on ourselves, we’ll create a few aliases in our .bash_profile for the start and stop commands. Open up .bash_profile in an editor and add the following lines

alias pg_start='pg_ctl -D /Users/$(whoami)/postgres -l /Users/$(whoami)/logfile start'
alias pg_stop='pg_ctl -D /Users/$(whoami)/postgres -l /Users/$(whoami)/logfile stop'

Lastly, let’s start up our postgres server by typing pg_start in the terminal. Note, this will not start your server automatically after a reboot. There are ways to configure that, and a few minutes searching the internet will likely guide you to success.
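For Homebrew installs, one common approach is to let Homebrew’s services manager register the server with launchd (a sketch; the exact formula name may differ, e.g. a versioned postgresql@ formula, depending on what you installed):

```shell
# register postgres with launchd so it starts at login
brew services start postgresql
# see what's currently managed by brew services
brew services list
# deregister it again if you prefer manual control
brew services stop postgresql
```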

Now, you need to install the PostGIS and pgRouting extensions into your postgres database. We can do this from the command line. If you provided a name for your database, you’ll want to include it after the psql command. This will not work if the brew install steps above were not successful.

psql
CREATE EXTENSION postgis;
CREATE EXTENSION pgrouting;
SELECT PostGIS_Version();
SELECT pgr_version();
exit

It is also a good idea to change some of the default configuration values for your postgres/postgis server to optimize performance for some of the complicated spatial operations and queries. These are settings I chose for a relatively new MacBook Pro; note that fsync = 'off' and synchronous_commit = 'off' trade crash safety for speed, so they are best reserved for a disposable analysis database.

ALTER SYSTEM SET work_mem = '1500MB';
ALTER SYSTEM SET shared_buffers = '4GB';
ALTER SYSTEM SET maintenance_work_mem = '1GB';
ALTER SYSTEM SET max_parallel_workers_per_gather = 4;
ALTER SYSTEM SET synchronous_commit = 'off';
ALTER SYSTEM SET fsync = 'off';
ALTER SYSTEM SET random_page_cost = 1.1;
ALTER SYSTEM SET wal_buffers = '16MB';
ALTER SYSTEM SET effective_cache_size = '10GB';
-- checkpoint_segments was removed in PostgreSQL 9.5; max_wal_size is its replacement
ALTER SYSTEM SET max_wal_size = '8GB';

SELECT pg_reload_conf();
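To confirm a setting was picked up, you can query it from psql (assuming the default local database from the install steps above; note that shared_buffers and wal_buffers only take effect after a full server restart, not a reload):

```shell
# reloadable settings (e.g. work_mem) should reflect the new value immediately
psql -c "SHOW work_mem;"
# restart-only settings (e.g. shared_buffers) show the old value until a restart
psql -c "SHOW shared_buffers;"
```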

Getting Started with Routing Seal Paths Around Land

First, we will load in a seal track from the crawl package that we’ll use as our demonstration. While we’re at it, let’s also go ahead and load our land barrier layer by pulling the Alaska data from the ptolemy package and cropping it to our study area. The final sf::st_union() is important because we need land_barrier to be a multipolygon geometry type. Also, note we are transforming everything into the epsg:3338 projection.

data("harborSeal_sf")
harborSeal_sf <- harborSeal_sf %>% 
  st_transform(3338)

ak <- ptolemy::alaska()
## importGSHHS status:
## --> Pass 1: complete: 10988 bounding boxes within limits.
## --> Pass 2: complete.
## --> Clipping...
land_barrier <- ak %>%
  st_crop(harborSeal_sf) %>%
  sf::st_collection_extract('POLYGON') %>%
  sf::st_cast('POLYGON') %>% 
  sf::st_union()

The next thing we need to do is connect our R session to the database so we can push up some data and run some commands. You’ll need the RPostgres package installed. If you chose a different name for your database, you’ll want to include it as an argument to the dbConnect() function (e.g. dbname = 'my_db').

con <- dbConnect(RPostgres::Postgres())

And, we’ll push up our land_barrier and harborSeal_sf objects. Since these are sf objects, the sf::st_write() function allows us to push them to the database while maintaining all of the spatial features.

sf::st_write(land_barrier, con, "land_barrier", overwrite = TRUE)
sf::st_write(harborSeal_sf, con, "harbor_seal", overwrite = TRUE)

Before we move on, let’s go ahead and plot our spatial objects with mapview().

mapview::mapview(list(land_barrier, harborSeal_sf))

Next, we’ll start passing along some SQL statements. These just make sure we drop any tables that might already exist. Be careful here: you’ll soon learn that creating the vis_graph table can take a very, very long time, so don’t inadvertently drop it unless you have the time to re-create it. The last step creates a spatial index for our land_barrier to, hopefully, speed up some of our spatial query operations.

dbExecute(con,"drop table if exists vis_graph")
dbExecute(con,"drop table if exists land_barrier_buffer")
dbExecute(con,"drop table if exists coastal_corridor")
dbExecute(con,"drop table if exists coastal_pts")

dbExecute(con,
"CREATE INDEX land_barrier_gist 
ON land_barrier USING GIST (geom)")

Next, we are going to create a coastline buffer/corridor that will constrain our network graph.

Let’s first buffer our land_barrier table by 3 kilometers

dbExecute(con,
"create table land_barrier_buffer as
select ST_Union(ST_Buffer(land_barrier.geom,3000)) geometry
from land_barrier"
)

and, make sure we have a primary key established as well as a spatial index

dbExecute(con, 
"alter table land_barrier_buffer add column gid serial primary key"
)

dbExecute(con,
"CREATE INDEX land_barrier_buffer_gist 
ON land_barrier_buffer USING GIST ( geometry )")

now, we create the coastal corridor where we will focus our network

dbExecute(con,
"create table coastal_corridor as
select ST_Difference(land_barrier_buffer.geometry,land_barrier.geom) geometry
from land_barrier, land_barrier_buffer"
)

and, then, we’ll create a grid of points within the coastal corridor at 1km spacing. You can choose a different grid resolution for your study area. The main cost is that more grid points means a larger network which adds to your initial network creation time and, later, the time it takes to find the shortest path.

dbExecute(con, 
"create table coastal_pts as 
select (ST_PixelAsCentroids(ST_AsRaster(geometry,1000.0,1000.0))).geom geometry 
from coastal_corridor"
)
dbExecute(con,
"alter table coastal_pts add column gid serial primary key"
)
dbExecute(con,
"CREATE INDEX coastal_pts_gist 
ON coastal_pts USING GIST ( geometry )")
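To get a feel for why grid spacing matters so much, note that n grid points produce n*(n-1)/2 candidate point-to-point lines before any filtering. A quick back-of-envelope (plain shell arithmetic, not tied to any particular study area):

```shell
# candidate line count grows quadratically with the number of grid points
for n in 1000 5000 10000; do
  echo "$n points -> $(( n * (n - 1) / 2 )) candidate lines"
done
# -> 1000 points yield 499500 candidate lines; 10000 points yield 49995000
```

Halving the grid spacing roughly quadruples the number of points in the corridor, so the candidate line count grows by roughly a factor of sixteen.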

Now, we are ready to start building our visibility graph. First, we’ll create lines from every coastal point to every other coastal point. Then, we’ll select out only those lines that fall completely within the coastal corridor. This is the key step that ensures our network only includes routes that pass through water. It is also the step that takes the longest (in this example, ~4 hours).

dbExecute(con,
"create table vis_graph as
select id, source, target, ll.geometry the_geom from 
(select
  row_number() over () as id,
  p1.gid as source,
  p2.gid as target,
  st_makeline(p1.geometry, p2.geometry)::geometry(LINESTRING) as geometry
from coastal_pts p1
join coastal_pts p2 on p1.gid < p2.gid) ll,
coastal_corridor
where st_within(ll.geometry, coastal_corridor.geometry)"
)

dbExecute(con,
"alter table vis_graph add primary key (id)"
)

Now, we’ll create the topology for our network. This is our first use of the pgRouting extension. This step creates a vis_graph_vertices_pgr table that includes all of our network nodes, and it also creates spatial indexes.

dbGetQuery(con, "select pgr_createTopology('vis_graph', 0.01, clean := true)")
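Assuming the default local database from the install steps above, a quick sanity check from the command line that the topology step actually populated the vertices table:

```shell
# a non-zero count means pgr_createTopology built the network nodes
psql -c "SELECT count(*) FROM vis_graph_vertices_pgr;"
```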

At this point, we are ready to calculate shortest-path distances through our network graph. To do this, we’ll create a custom PL/pgSQL function. You don’t really need to know what’s going on within this function; it is here to document the code for those who are interested (and those who might offer suggestions for improvement).

-- DROP FUNCTION wrk_fromAtoB(text, text, integer, integer);

CREATE OR REPLACE FUNCTION wrk_fromAtoB(
    IN start_pt TEXT,
    IN end_pt TEXT,
    IN in_sid INTEGER,
    IN search_buffer INTEGER,
    OUT sid INTEGER,

    OUT geom geometry
)
RETURNS SETOF record AS
$BODY$
DECLARE
    final_query TEXT;
BEGIN

    final_query :=
        FORMAT( $$
            WITH
            dijkstra AS (
                SELECT *
                FROM pgr_dijkstra(
                    'SELECT
                      id,
                      source,
                      target,
                      1 as cost
                    FROM vis_graph
                    WHERE the_geom && ST_Expand(
                      (ST_Collect(ST_GeomFromEWKT(''%1$s''),ST_GeomFromEWKT(''%2$s'')) ) ,
                      ''%4$s'')'::text,
                    -- source
                    (SELECT id FROM vis_graph_vertices_pgr
                        ORDER BY the_geom <-> ST_GeomFromEWKT('%1$s') LIMIT 1),
                    -- target
                    (SELECT id FROM vis_graph_vertices_pgr
                        ORDER BY the_geom <-> ST_GeomFromEWKT('%2$s') LIMIT 1),
                        false)
            ),
            get_geom AS (
                SELECT dijkstra.*,
                  CASE
                    WHEN dijkstra.node = vis_graph.source THEN the_geom
                    ELSE ST_Reverse(the_geom)
                END AS route_geom
                FROM dijkstra JOIN vis_graph ON (edge = vis_graph.id)
                ORDER BY seq)
            SELECT
                %3$s sid,
                CASE
                  WHEN (SELECT id FROM vis_graph_vertices_pgr
                        ORDER BY the_geom <-> ST_GeomFromEWKT('%1$s') LIMIT 1) =
                       (SELECT id FROM vis_graph_vertices_pgr
                        ORDER BY the_geom <-> ST_GeomFromEWKT('%2$s') LIMIT 1) THEN
                    st_makeline(
                    ARRAY[ST_GeomFromEWKT('%1$s'),
                    (SELECT the_geom FROM vis_graph_vertices_pgr
                        ORDER BY the_geom <-> ST_GeomFromEWKT('%1$s') LIMIT 1),
                    ST_GeomFromEWKT('%2$s')])
                  ELSE
                    st_addpoint(
                    st_addpoint(
                    st_linemerge(st_collect(route_geom)),ST_GeomFromEWKT('%1$s'),0),
                    ST_GeomFromEWKT('%2$s'))
                END AS the_geom
            FROM get_geom JOIN vis_graph ON get_geom.edge = vis_graph.id;$$,
        start_pt, end_pt, in_sid, search_buffer); -- %1 to %4 of the FORMAT function
    -- RAISE notice '%', final_query;
    RETURN QUERY EXECUTE final_query;
END;
$BODY$
LANGUAGE 'plpgsql';

To load this function into your PostgreSQL session, we’ll need to execute the following step

stmt <- paste(readLines(here::here('sql/wrk_fromAtoB.sql')), collapse = '\n')
dbExecute(con, "DROP FUNCTION IF EXISTS wrk_fromAtoB(text, text, integer, integer)")
## [1] 0
dbExecute(con, stmt)
## [1] 0

Now, let’s load up our harbor seal example included in crawl and put all of our work into action.

data(harborSeal_sf)

harborSeal_sf <- harborSeal_sf %>% sf::st_transform(3338)

##Fit model as given in Johnson et al. (2008) Ecology 89:1208-1215
## Start values for theta come from the estimates in Johnson et al. (2008)
fixPar = c(log(250), log(500), log(1500), rep(NA,5), 0)
displayPar( mov.model=~1, err.model=list(x=~Argos_loc_class-1),data=harborSeal_sf, 
            activity=~I(1-DryTime),fixPar=fixPar)
##                  ParNames   fixPar thetaIdx
## 1 ln tau Argos_loc_class0 5.521461       NA
## 2 ln tau Argos_loc_class1 6.214608       NA
## 3 ln tau Argos_loc_class2 7.313220       NA
## 4 ln tau Argos_loc_class3       NA        1
## 5 ln tau Argos_loc_classA       NA        2
## 6 ln tau Argos_loc_classB       NA        3
## 7    ln sigma (Intercept)       NA        4
## 8     ln beta (Intercept)       NA        5
## 9                  ln phi 0.000000       NA
constr=list(
  lower=c(rep(log(1500),3), rep(-Inf,2)),
  upper=rep(Inf,5)
)

set.seed(123)
fit1 <- crwMLE(
  mov.model=~1, err.model=list(x=~Argos_loc_class-1), activity=~I(1-DryTime),
  data=harborSeal_sf,  Time.name="Time", 
  fixPar=fixPar, theta=c(rep(log(5000),3),log(3*3600), 0),
  constr=constr, method="L-BFGS-B",
  control=list(maxit=2000, trace=1, REPORT=1)
)
## iter    1 value 41202.609844
## iter    2 value 41056.998394
## iter    3 value 40756.499671
## iter    4 value 40146.411888
## iter    5 value 40066.268387
## iter    6 value 40005.050128
## iter    7 value 40001.092459
## iter    8 value 40001.050324
## iter    9 value 40001.048541
## iter   10 value 40001.048534
## final  value 40001.048534 
## converged
print(fit1)
## 
## 
## Continuous-Time Correlated Random Walk fit
## 
## Models:
## --------
## Movement   ~ 1
## Error   ~Argos_loc_class - 1
## 
## 
##                         Parameter Est. St. Err. 95% Lower 95% Upper
## ln tau Argos_loc_class0          5.521        .         .         .
## ln tau Argos_loc_class1          6.215        .         .         .
## ln tau Argos_loc_class2          7.313        .         .         .
## ln tau Argos_loc_class3          7.313    0.182     6.957     7.669
## ln tau Argos_loc_classA          7.313    0.076     7.165     7.462
## ln tau Argos_loc_classB          7.313    0.091     7.135     7.492
## ln sigma (Intercept)             8.421     0.03     8.363      8.48
## ln beta (Intercept)             -0.122    0.149    -0.414      0.17
## ln phi                           0.000        .         .         .
## 
## 
## Log Likelihood = -20000.524 
## AIC = 40011.049
pred1 = crwPredict(fit1, predTime = '1 hour')
pred1_sf <- pred1 %>% crw_as_sf("POINT","p")

We’ll create a get_land_segments() function to return a data.frame of all the track points that fall within our provided land_mask.

get_land_segments = function(crw_sf, land_mask) {
  # flag track points that intersect the land mask
  on_mask <- sf::st_intersects(crw_sf, land_mask) %>% 
    purrr::map_lgl(~ length(.x) > 0)
  
  in.segment <- (on_mask == TRUE)
  
  # index of the last in-water point before each run of on-land points
  start_idx <- which(c(FALSE, in.segment) == TRUE &
                       dplyr::lag(c(FALSE, in.segment) == FALSE)) - 2
  # index of the first in-water point after each run of on-land points
  end_idx <- which(c(in.segment, FALSE) == TRUE & 
                     dplyr::lead(c(in.segment, FALSE) == FALSE)) + 1
  # one row per on-land segment; start/end points are stored as EWKT
  # so they can be passed to the database
  on_mask_segments <- data.frame(sid = 1:length(start_idx),
                                 start_idx, end_idx) %>%
    dplyr::mutate(n_pts = end_idx-start_idx-1) %>% 
    rowwise() %>% 
    dplyr::mutate(start_pt = crw_sf[start_idx, ] %>% st_geometry() %>% 
                    sf::st_as_text(EWKT = TRUE),
                  end_pt = crw_sf[end_idx, ] %>% st_geometry() %>% 
                    sf::st_as_text(EWKT = TRUE)) %>% 
    dplyr::ungroup()
  
  return(on_mask_segments)
}

Here, segs_tbl is a data.frame of all our on-land segments. We will need access to this data within our database, so we’ll push it up as a temporary table (it will only be available for this database connection/session).

segs_tbl <- get_land_segments(pred1_sf,land_barrier)
dbWriteTable(con,"segs_tbl",segs_tbl,temporary = TRUE, overwrite = TRUE)

Now, we’ll finish up by using our wrk_fromAtoB() function on the database to calculate the shortest path through our network (in-water corridor). To speed things up, we’ll limit the network search area to a 50km buffer around a bounding box that includes our start and end points. This can be adjusted if needed, but 50km seemed a good default value.

stmt <- "select (wrk_fromAtoB(s.start_pt, s.end_pt, s.sid, 50000)).* from
(select start_pt, end_pt, sid from segs_tbl) s"
segs_tbl <- sf::st_read(con,query = stmt) %>% 
  dplyr::inner_join(segs_tbl, by = c("sid" = "sid"))

We can use the mapview package to view and zoom in on each of these re-routed segments.

mapview::mapview(pred1 %>% crw_as_sf("LINESTRING","p"), 
                 color = "grey", layer.name = "predicted track") +
  mapview::mapview(segs_tbl, layer.name = "re-routed segments")